Provably Optimal Algorithms for Generalized Linear Contextual Bandits
Abstract
$$G(\theta_1) - G(\theta_2) := F(\bar\theta)\,(\theta_1 - \theta_2). \qquad (17)$$

Since $\dot\mu > 0$ and $\lambda_{\min}(V) > 0$, we have
$$(\theta_1 - \theta_2)^\top \bigl(G(\theta_1) - G(\theta_2)\bigr) = (\theta_1 - \theta_2)^\top F(\bar\theta)\,(\theta_1 - \theta_2) \ge \min_t \dot\mu(X_t^\top \bar\theta)\,(\theta_1 - \theta_2)^\top V\,(\theta_1 - \theta_2) > 0$$
for any $\theta_1 \ne \theta_2$. Hence, $G(\theta)$ is an injection from $\mathbb{R}^d$ to $\mathbb{R}^d$, and so $G^{-1}$ is a well-defined function. Consequently, (15) has a unique solution $\hat\theta = G^{-1}(Z)$.

Let us consider an $\eta$-neighborhood of $\theta^*$, $B_\eta := \{\theta : \|\theta - \theta^*\| \le \eta\}$, where $\eta > 0$ is a constant that will be specified later. Note that $B_\eta$ is a convex set, thus $\bar\theta \in B_\eta$ as long as $\theta_1, \theta_2 \in B_\eta$. Define $\kappa_\eta := \inf_{\theta \in B_\eta} \dot\mu(x^\top \theta) > 0$. From (17), for any $\theta \in B_\eta$,
$$\|G(\theta)\|_{V^{-1}}^2 = \|G(\theta) - G(\theta^*)\|_{V^{-1}}^2 = (\theta - \theta^*)^\top F(\bar\theta)\, V^{-1} F(\bar\theta)\,(\theta - \theta^*) \ge \kappa_\eta^2\, \lambda_{\min}(V)\, \|\theta - \theta^*\|^2,$$
where the last inequality is due to the fact that $F(\bar\theta) \succeq \kappa_\eta V$ (and the first equality uses $G(\theta^*) = 0$). On the other hand, Lemma A of Chen et al. (1999) implies that
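The injectivity argument above can be checked numerically. The sketch below is our illustration, not part of the paper: the logistic link, the data sizes, and the Newton solver are all assumptions. It builds $G$ and its Jacobian $F$ for a logistic model and solves the score equation $G(\hat\theta) = Z$, which has a unique root because $F(\theta)$ dominates a positive multiple of $V$.

```python
import numpy as np

# Numerical sketch (our illustration, not from the paper): with the logistic
# link mu(z) = 1/(1+exp(-z)) we have mu_dot > 0, so the map
#   G(theta) = sum_t mu(x_t' theta) x_t
# has Jacobian F(theta) = sum_t mu_dot(x_t' theta) x_t x_t', which dominates
# a positive multiple of V = sum_t x_t x_t'. When V is positive definite,
# G is injective and the score equation G(theta_hat) = Z has a unique root,
# which plain Newton iteration finds.

rng = np.random.default_rng(0)
d, n = 3, 200
X = rng.normal(size=(n, d))                  # contexts x_t
theta_star = np.array([0.5, -1.0, 0.3])      # assumed true parameter
mu = lambda z: 1.0 / (1.0 + np.exp(-z))
y = rng.binomial(1, mu(X @ theta_star))      # binary rewards

Z = X.T @ y                                  # right-hand side of G(theta) = Z

def G(theta):
    return X.T @ mu(X @ theta)

def F(theta):                                # Jacobian of G
    w = mu(X @ theta) * (1.0 - mu(X @ theta))  # mu_dot for the logistic link
    return (X * w[:, None]).T @ X

theta = np.zeros(d)
for _ in range(50):                          # Newton's method on G(theta) = Z
    theta -= np.linalg.solve(F(theta), G(theta) - Z)
# theta now solves G(theta) = Z up to numerical precision
```

The positive-definiteness of $F$ is exactly what makes the Newton step well defined at every iterate.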
Similar articles
Provably Optimal Algorithms for Generalized Linear Contextual Bandits
Contextual bandits are widely used in Internet services, from news recommendation to advertising to Web search. Generalized linear models (logistic regression in particular) have demonstrated stronger performance than linear models in many applications where rewards are binary. However, most theoretical analyses of contextual bandits so far are on linear bandits. In this work, we propose ...
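As a concrete illustration of the optimistic arm selection such GLM bandit algorithms use, here is a minimal sketch of one UCB-style step: score each candidate context by its estimated mean reward plus an exploration bonus proportional to $\|x\|_{V^{-1}}$. The function name, the logistic link, and the tuning parameter `alpha` are our assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Hedged sketch of a UCB-style selection step for a GLM bandit: estimated
# reward mu(x' theta_hat) plus an exploration bonus alpha * ||x||_{V^{-1}},
# where V accumulates past context outer products. Illustrative only.

def ucb_glm_choose(arms, theta_hat, V_inv, alpha):
    """arms: (K, d) candidate contexts; returns the index of the chosen arm."""
    mu = lambda z: 1.0 / (1.0 + np.exp(-z))       # logistic link (assumed)
    means = mu(arms @ theta_hat)                  # estimated mean rewards
    bonus = alpha * np.sqrt(np.einsum('kd,de,ke->k', arms, V_inv, arms))
    return int(np.argmax(means + bonus))

arms = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
choice = ucb_glm_choose(arms, np.array([0.2, 0.1]), np.eye(2), alpha=0.5)
```

With the identity `V_inv` all bonuses are comparable, so the arm with the best estimated mean wins; as `V` grows in well-explored directions, the bonus shrinks there and exploration shifts elsewhere.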
Fairness in Learning: Classic and Contextual Bandits
We introduce the study of fairness in multi-armed bandit problems. Our fairness definition demands that, given a pool of applicants, a worse applicant is never favored over a better one, despite a learning algorithm’s uncertainty over the true payoffs. In the classic stochastic bandits problem we provide a provably fair algorithm based on “chained” confidence intervals, and prove a cumulative r...
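One way the "chained" confidence-interval idea can be read (our illustrative interpretation, not the paper's exact construction): starting from the arm with the highest upper bound, link in every arm whose interval overlaps the chain so far, and treat all linked arms as statistically indistinguishable, so no linked arm is favored over another.

```python
import numpy as np

# Illustrative sketch of chaining overlapping confidence intervals (assumed
# reading of the abstract, not the paper's algorithm). Arms inside the chain
# cannot be ranked with confidence, so a fair rule would not favor one of
# them over another.

def chained_set(lower, upper):
    """lower/upper: per-arm confidence bounds; returns indices in the chain."""
    order = np.argsort(upper)[::-1]      # start from the highest upper bound
    chain = {int(order[0])}
    threshold = lower[order[0]]
    for i in order[1:]:
        if upper[i] >= threshold:        # interval overlaps the chain so far
            chain.add(int(i))
            threshold = min(threshold, lower[i])
        else:
            break                        # remaining arms are provably worse
    return chain

lo = np.array([0.6, 0.5, 0.1])
hi = np.array([0.9, 0.7, 0.2])
# arms 0 and 1 overlap (0.7 >= 0.6); arm 2 is separated (0.2 < 0.5)
```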
Lipschitz Bandits: Regret Lower Bound and Optimal Algorithms
We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitz bandits, we derive asymptotic problem-specific lower bounds for the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of t...
Linear Contextual Bandits with Knapsacks
We consider the linear contextual bandit problem with resource consumption, in addition to reward generation. In each round, the outcome of pulling an arm is a reward as well as a vector of resource consumptions. The expected values of these outcomes depend linearly on the context of that arm. The budget/capacity constraints require that the total consumption doesn’t exceed the budget for each ...
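The round-by-round protocol described above can be sketched as a toy simulation: each pull yields a reward and a consumption vector, both linear in the pulled arm's context, and play stops once any resource budget is exhausted. The parameter names (`theta_r`, `W`) and the stopping rule are our assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm) of the interaction protocol:
# pulling an arm yields a reward and a consumption vector, each linear in the
# arm's context, and the game ends when any resource budget is exhausted.

rng = np.random.default_rng(1)
d, n_resources = 3, 2
theta_r = rng.normal(size=d)                      # reward parameter (assumed)
W = rng.uniform(0.1, 0.5, size=(n_resources, d))  # consumption parameters

budget = np.full(n_resources, 5.0)
total_reward, spent = 0.0, np.zeros(n_resources)

while np.all(spent <= budget):                    # stop once any budget is hit
    x = rng.uniform(0.0, 1.0, size=d)             # context of the pulled arm
    total_reward += float(x @ theta_r)            # expected reward (noise omitted)
    spent += W @ x                                # vector of resource consumptions
```

An algorithm for this setting must trade reward against consumption per round; the simulation above only models the environment side of that interaction.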
Linear Contextual Bandits with Global Constraints and Objective
We consider the linear contextual bandit problem with global convex constraints and a concave objective function. In each round, the outcome of pulling an arm is a vector that depends linearly on the context of that arm. The global constraints require the average of these vectors to lie in a certain convex set. The objective is a concave function of this average vector. This probl...